Reward Machines
Reward Machines for Deep RL in Noisy and Uncertain Environments
Reward Machines provide an automaton-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing the underlying structure of a reward function, they enable the decomposition of an RL task, leading to impressive gains in sample efficiency. Although Reward Machines and similar formal specifications have a rich history of application towards sequential decision-making problems, prior frameworks have traditionally ignored ambiguity and uncertainty when interpreting the domain-specific vocabulary forming the building blocks of the reward function. Such uncertainty critically arises in many real-world settings due to factors like partial observability or noisy sensors. In this work, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that exploit task structure under uncertain interpretation of the domain-specific vocabulary. Through theory and experiments, we expose pitfalls in naive approaches to this problem while simultaneously demonstrating how task structure can be successfully leveraged under noisy interpretations of the vocabulary.
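To ground the formalism, here is a minimal sketch (not the paper's implementation) of a reward machine as a finite automaton whose transitions fire on propositional labels detected in the environment and emit scalar reward. The task, propositions, and reward values below are invented for illustration.

```python
# Minimal reward machine: a finite automaton whose transitions, driven by
# propositional labels detected in the environment, emit scalar reward.
# Invented task: reach "coffee", then "office"; "decoy" resets progress.

class RewardMachine:
    def __init__(self, transitions, initial_state, terminal_states):
        self.transitions = transitions      # {(state, label): (state', reward)}
        self.state = initial_state
        self.terminal_states = terminal_states

    def step(self, label):
        """Advance on one detected label; unlisted labels self-loop."""
        self.state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0))
        return reward

    @property
    def done(self):
        return self.state in self.terminal_states

rm = RewardMachine(
    transitions={
        ("u0", "coffee"): ("u1", 0.0),   # first subgoal, no reward yet
        ("u1", "office"): ("u2", 1.0),   # task complete
        ("u1", "decoy"):  ("u0", 0.0),   # progress lost
    },
    initial_state="u0",
    terminal_states={"u2"},
)

for label in ["decoy", "coffee", "office"]:
    reward = rm.step(label)
    print(f"{label}: state={rm.state} reward={reward} done={rm.done}")
```

In RM-based RL the agent conditions its policy on the pair (environment state, machine state), which restores the Markov property. The paper's central concern is that `step` above assumes each label is observed perfectly, whereas a noisy detector only yields a belief over labels.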
Reinforcement Learning with $ω$-Regular Objectives and Constraints
Wagner, Dominik, Witzman, Leon, Ong, Luke
Reinforcement learning (RL) commonly relies on scalar rewards with limited ability to express temporal, conditional, or safety-critical goals, and can lead to reward hacking. Temporal logic, expressible via the more general class of $ω$-regular objectives, addresses this by precisely specifying rich behavioural properties. Even so, measuring performance by a single scalar (be it reward or satisfaction probability) masks safety-performance trade-offs that arise in settings with a tolerable level of risk. We address both limitations simultaneously by combining $ω$-regular objectives with explicit constraints, allowing safety requirements and optimisation targets to be treated separately. We develop a model-based RL algorithm based on linear programming, which in the limit produces a policy maximising the probability of satisfying an $ω$-regular objective while also adhering to $ω$-regular constraints within specified thresholds. Furthermore, we establish a translation to constrained limit-average problems with optimality-preserving guarantees.
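To illustrate the linear-programming angle at toy scale: once the product with the objective and constraint automata is taken, the simplest instances reduce to constrained reachability, solvable as an occupancy-measure LP. The MDP, probabilities, and threshold below are invented, and this is only a sketch of the idea, not the paper's algorithm.

```python
# Occupancy-measure LP for a toy constrained reachability problem:
#   maximise P(reach GOAL)  subject to  P(reach BAD) <= threshold.
# Invented MDP: one decision state s0 with two actions,
#   risky: GOAL w.p. 0.95, BAD w.p. 0.05
#   safe:  GOAL w.p. 0.60, neutral sink w.p. 0.40
# Variables x = [x_risky, x_safe] are expected occupancies of (s0, action).

from scipy.optimize import linprog

threshold = 0.02                      # tolerated probability of reaching BAD

c = [-0.95, -0.60]                    # maximise reach-GOAL prob (negated)
A_eq = [[1.0, 1.0]]                   # flow: all unit initial mass leaves s0
b_eq = [1.0]
A_ub = [[0.05, 0.0]]                  # reach-BAD probability
b_ub = [threshold]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None)])
x_risky, x_safe = res.x
print("P(GOAL) =", -res.fun)                              # -> 0.74
print("policy: risky w.p.", x_risky, "safe w.p.", x_safe)  # -> 0.4 / 0.6
```

The optimum mixes the two actions, which illustrates why constrained problems generally require randomised policies and LP machinery rather than plain value iteration.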
Pushdown Reward Machines for Reinforcement Learning
Varricchione, Giovanni, Klassen, Toryn Q., Alechina, Natasha, Dastani, Mehdi, Logan, Brian, McIlraith, Sheila A.
Reward machines (RMs) are automata structures that encode (non-Markovian) reward functions for reinforcement learning (RL). RMs can reward any behaviour representable in regular languages and, when paired with RL algorithms that exploit RM structure, have been shown to significantly improve sample efficiency in many domains. In this work, we present pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata. pdRMs can recognise and reward temporally extended behaviours representable in deterministic context-free languages, making them more expressive than reward machines. We introduce two variants of pdRM-based policies, one which has access to the entire stack of the pdRM, and one which can only access the top $k$ symbols (for a given constant $k$) of the stack. We propose a procedure to check when the two kinds of policies (for a given environment, pdRM, and constant $k$) achieve the same optimal state values. We then provide theoretical results establishing the expressive power of pdRMs, and space complexity results for the proposed learning problems. Lastly, we propose an approach for off-policy RL algorithms that exploits counterfactual experiences with pdRMs. We conclude by providing experimental results showing how agents can be trained to perform tasks representable in deterministic context-free languages using pdRMs.
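A minimal sketch of the stack mechanism (invented events, not the paper's code): the machine below rewards the context-free behaviour "every pick-up is eventually matched by a put-down", an $a^nb^n$-style pattern no finite-state RM can track, and exposes the truncated top-$k$ stack view that one of the paper's two policy variants conditions on.

```python
# Minimal pushdown reward machine (pdRM) sketch: an RM whose state
# includes a stack, letting it reward context-free behaviours.
# Events and rewards are invented for illustration.

class PushdownRewardMachine:
    def __init__(self):
        self.stack = []
        self.failed = False

    def step(self, event):
        """Advance on one event; return reward."""
        if self.failed:
            return 0.0
        if event == "pickup":
            self.stack.append("item")
            return 0.0
        if event == "putdown":
            if not self.stack:              # unmatched put-down: reject
                self.failed = True
                return 0.0
            self.stack.pop()
            # reward only when the last open pick-up is matched
            return 1.0 if not self.stack else 0.0
        return 0.0

    def top_k(self, k):
        """The truncated stack view a top-k pdRM policy conditions on."""
        return tuple(self.stack[-k:])

pdrm = PushdownRewardMachine()
for e in ["pickup", "pickup", "putdown", "putdown"]:
    print(e, "reward:", pdrm.step(e), "stack view:", pdrm.top_k(2))
```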
Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines
Levina, Kristina, Pappas, Nikolaos, Karapantelakis, Athanasios, Feljan, Aneta Vulgarakis, Seipp, Jendrik
Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Our experiments show that CoRM scales better than state-of-the-art RM algorithms for long-horizon problems with unordered subtasks.
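The blow-up the paper targets, and the agenda remedy, in a toy sketch (subtask names invented): a Boolean RM must dedicate an automaton state to every subset of finished subtasks, while an agenda-style RM carries the remaining-subtask set as data and applies one generic rule. CoRM's coupled per-subtask states are not shown here.

```python
# Sketch of the unordered-subtask blow-up and the agenda idea.
# A Boolean RM tracking which of n unordered subtasks are done needs one
# automaton state per subset: 2**n states. An agenda-style RM instead
# carries the remaining-subtask set explicitly, so tabular Q-learning
# can key on (env_state, frozenset(remaining)).

from itertools import chain, combinations

subtasks = ["wood", "stone", "iron", "wheat"]

# Explicit Boolean-RM state space: one state per subset of done subtasks.
boolean_rm_states = list(
    chain.from_iterable(combinations(subtasks, r)
                        for r in range(len(subtasks) + 1)))
print(len(boolean_rm_states))   # 2**4 = 16 states, doubling per subtask

def agenda_step(remaining, event):
    """Return (new_agenda, reward); reward 1 when the agenda empties."""
    if event in remaining:
        remaining = remaining - {event}
        return remaining, (1.0 if not remaining else 0.0)
    return remaining, 0.0

agenda = frozenset(subtasks)
for event in ["stone", "wood", "wheat", "iron"]:
    agenda, r = agenda_step(agenda, event)
    print(event, "remaining:", sorted(agenda), "reward:", r)
```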
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Castanyer, Roger Creus, Mohamed, Faisal, Castro, Pablo Samuel, Neary, Cyrus, Berseth, Glen
Reinforcement learning (RL) algorithms are highly sensitive to reward function specification, which remains a central challenge limiting their broad applicability. We present ARM-FM: Automated Reward Machines via Foundation Models, a framework for automated, compositional reward design in RL that leverages the high-level reasoning capabilities of foundation models (FMs). Reward machines (RMs) -- an automata-based formalism for reward specification -- are used as the mechanism for RL objective specification, and are automatically constructed via the use of FMs. The structured formalism of RMs yields effective task decompositions, while the use of FMs enables objective specifications in natural language. Concretely, we (i) use FMs to automatically generate RMs from natural language specifications; (ii) associate language embeddings with each RM automata-state to enable generalization across tasks; and (iii) provide empirical evidence of ARM-FM's effectiveness in a diverse suite of challenging environments, including evidence of zero-shot generalization.
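A sketch of the data flow under stated assumptions, with `call_fm` and `embed` as hypothetical stand-ins rather than any real API: a foundation model returns a structured RM spec for a natural-language task, and each RM state is tagged with an embedding of its description so that policies can generalise across tasks with similar subgoals.

```python
def call_fm(task_description):
    # Hypothetical stand-in for a foundation-model call that would
    # return a structured RM spec for the given natural-language task.
    return {
        "states": {"u0": "find the key", "u1": "open the chest", "u2": "done"},
        "initial": "u0",
        "transitions": {("u0", "key"): ("u1", 0.0),
                        ("u1", "chest"): ("u2", 1.0)},
    }

def embed(text):
    # Hypothetical stand-in for a language-embedding model (toy vector).
    return [ord(c) / 1000.0 for c in text[:8]]

spec = call_fm("collect the key, then open the chest")
# Associate a language embedding with each RM automaton state.
state_embeddings = {u: embed(desc) for u, desc in spec["states"].items()}
print(state_embeddings["u0"][:3])
```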
Expressive Reward Synthesis with the Runtime Monitoring Language
Donnelly, Daniel, Ferrando, Angelo, Belardinelli, Francesco
A key challenge in reinforcement learning (RL) is reward (mis)specification, whereby imprecisely defined reward functions can result in unintended, possibly harmful, behaviours. Indeed, reward functions in RL are typically treated as black-box mappings from state-action pairs to scalar values. While effective in many settings, this approach provides no information about why rewards are given, which can hinder learning and interpretability. Reward Machines address this issue by representing reward functions as finite state automata, enabling the specification of structured, non-Markovian reward functions. However, their expressivity is typically bounded by regular languages, leaving them unable to capture more complex behaviours such as counting or parametrised conditions. In this work, we build on the Runtime Monitoring Language (RML) to develop a novel class of language-based Reward Machines. By leveraging the built-in memory of RML, our approach can specify reward functions for non-regular, non-Markovian tasks. We demonstrate the expressiveness of our approach through experiments, highlighting additional advantages in flexible event-handling and task specification over existing Reward Machine-based methods.
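A sketch of the memory idea in plain Python (this is not RML syntax; RML specifications are written in their own DSL and compiled to monitors): a reward monitor with unbounded parametrised memory can reward behaviour no finite-state RM captures, such as "every picked parcel id is eventually delivered". Event names and the reward scheme are invented.

```python
# RML-style memory sketch: a reward monitor over parametrised events.
# The set of open parameters is unbounded, so no finite-state RM can
# encode this condition.

class ParametrisedRewardMonitor:
    """Reward 1 when every picked parcel id has been delivered."""
    def __init__(self):
        self.open_ids = set()     # unbounded memory: parcels still held
        self.picked_any = False

    def step(self, event, parcel_id):
        if event == "pick":
            self.open_ids.add(parcel_id)
            self.picked_any = True
            return 0.0
        if event == "deliver" and parcel_id in self.open_ids:
            self.open_ids.discard(parcel_id)
            # reward once the last outstanding parcel is delivered
            return 1.0 if (self.picked_any and not self.open_ids) else 0.0
        return 0.0

mon = ParametrisedRewardMonitor()
for ev, pid in [("pick", 7), ("pick", 3), ("deliver", 3), ("deliver", 7)]:
    print(ev, pid, "reward:", mon.step(ev, pid))
```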